

Section: New Results

Statistical learning methods for high-dimensional data

  • Genuer R, Poggi J-M, Tuleau-Malot C, Villa-Vialaneix N. Random Forests for Big Data. Big Data Research, 9 (2017). [18]

    Addresses the analysis of Big Data with Random Forests: a review of existing algorithms, a simulation study, and recommendations.

  • Agniel D, Hejblum BP. Variance component score test for time-course gene set analysis of longitudinal RNA-seq data. Biostatistics, 18(4):589–604 (2017). [16]

    We propose tcgsaseq, a principled, model-free, and efficient method for detecting longitudinal changes in RNA-seq gene sets defined a priori. Applied to both simulated data and two real datasets, tcgsaseq exhibits very good statistical properties, with increased stability and power compared with state-of-the-art methods.

  • Hejblum BP, Alkhassim C, Gottardo R, Caron F, Thiébaut R. Sequential Dirichlet Process Mixtures of Multivariate Skew t-distributions for Model-based Clustering of Flow Cytometry Data. Preprint on arXiv. [39]

    We propose a Bayesian nonparametric approach, based on a Dirichlet process mixture of multivariate skew t-distributions, to perform model-based clustering of flow cytometry data while robustly estimating the number of cell populations from the data.